npj Precision Oncology
Springer Science and Business Media LLC
All preprints, ranked by how well they match npj Precision Oncology's content profile, based on 14 papers previously published here. The average preprint has a 0.11% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.
Park, Y.; Park, S.; Bae, E.
Despite continued advances in oncology, cancer remains a leading cause of global mortality, highlighting the need for diagnostic and prognostic tools that are both accurate and interpretable. Unimodal approaches often fail to capture the biological and clinical complexity of tumors. In this study, we present a suite of task-specific AI models that leverage CT imaging, multi-omics profiles, and structured clinical data to address distinct challenges in segmentation, classification, and prognosis. We developed three independent models across large public datasets. Task 1 applied a 3D U-Net to segment pancreatic tumors from CT scans, achieving a Dice Similarity Coefficient (DSC) of 0.7062. Task 2 employed a hierarchical ensemble of omics-based classifiers to distinguish tumor from normal tissue and classify six major cancer types with 98.67% accuracy. Task 3 benchmarked classical machine learning models on clinical data for prognosis prediction across three cancers (LIHC, KIRC, STAD), achieving strong performance (e.g., C-index of 0.820 in KIRC, AUC of 0.978 in LIHC). Across all tasks, explainable AI methods such as SHAP and attention-based visualization enabled transparent interpretation of model outputs. These results demonstrate the value of tailored, modality-aware models and underscore the clinical potential of applying such tailored AI systems for precision oncology.
Technical Foundations:
- Segmentation (Task 1): A custom 3D U-Net was trained using the Task07_Pancreas dataset from the Medical Segmentation Decathlon (MSD). CT images were preprocessed with MONAI-based pipelines, resampled to (64, 96, 96) voxels, and intensity-windowed to an HU range of -100 to 240.
- Classification (Task 2): Multi-omics data from TCGA (gene expression, methylation, miRNA, CNV, and mutation profiles) were log-transformed and normalized. Five modality-specific LightGBM classifiers generated meta-features for a late-fusion ensemble. Stratified 5-fold cross-validation was used for evaluation.
- Prognosis (Task 3): Clinical variables from TCGA were curated and imputed (median/mode), with high-missing-rate columns removed. Survival models (e.g., Cox-PH, Random Forest, XGBoost) were trained with early stopping. No omics or imaging data were used in this task.
- Interpretability: SHAP values were computed for all tree-based models, and attention-based overlays were used in imaging tasks to visualize salient regions.
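For readers unfamiliar with the two quantities named in Task 1, the following is a minimal sketch of HU intensity windowing and the Dice Similarity Coefficient. It is not the authors' pipeline (which the abstract says was MONAI-based); the function names `window_hu` and `dice` are hypothetical, and rescaling the clipped values to [0, 1] is an assumption, since the abstract only states the window bounds of -100 to 240.

```python
from typing import Sequence, List


def window_hu(values: Sequence[float], lo: float = -100.0, hi: float = 240.0) -> List[float]:
    """Clip Hounsfield units to [lo, hi], then rescale to [0, 1] (rescaling is an assumption)."""
    return [(min(max(v, lo), hi) - lo) / (hi - lo) for v in values]


def dice(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2 * |pred AND target| / (|pred| + |target|) for flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Both masks empty: treat as perfect agreement (a common convention, assumed here).
        return 1.0
    return 2.0 * intersection / total
```

A 3D volume would be flattened to a single sequence of voxels before calling `dice`; a DSC of 1.0 means the predicted and reference masks coincide exactly.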
McCaw, Z. R.; Shcherbina, A.; Shah, Y.; Huang, D.; Elliott, S.; Szabo, P. M.; Dulken, B.; Holland, S.; Tagari, P.; Light, D.; Koller, D.; Probert, C.
Current predictive biomarkers generally leverage technologies such as immunohistochemistry or genetic analysis, which may require specialized equipment, be time-intensive to deploy, or incur human error. In this paper, we present an alternative approach for the development and deployment of a class of predictive biomarkers, leveraging deep learning on digital images of hematoxylin and eosin (H&E)-stained biopsy samples to simultaneously predict a range of molecular factors that are relevant to treatment selection and response. Our framework begins with the training of a pan-solid tumor H&E foundation model, which can generate a universal featurization of H&E-stained tissue images. This featurization becomes the input to machine learning models that perform multi-target, pan-cancer imputation. For a set of 352 drug targets, we show the ability to predict with high accuracy: copy number amplifications, target RNA expression, and an RNA-derived "amplification signature" that captures the transcriptional consequences of an amplification event. We facilitate exploratory analyses by making broad predictions initially. Having identified the subset of biomarkers relevant to a patient population of interest, we develop specialized machine learning models, built on the same foundational featurization, which achieve even higher performance for key biomarkers in tumor types of interest. Moreover, our models are robust, generalizing with minimal loss of performance across different patient populations. By generating imputations from tile-level featurizations, we enable spatial overlays of molecular annotations on top of whole-slide images. These annotation maps provide a clear means of interpreting the histological correlates of our models' predictions, and align with features identified by expert pathologist review.
Overall, our work demonstrates a flexible and scalable framework for imputing molecular measurements from H&E, providing a generalizable approach to the development and deployment of predictive biomarkers for targeted therapeutics in cancer.
Xu, L.; Kefella, Y.; Zhang, Y.; Conrad, R. D.; Anderson, K. E.; Krysan, K.; Liu, G.; Kane, E.; Pennycuick, A.; Janes, S. M.; Reid, M. E.; Burks, E. J.; Billatos, E.; Mazzilli, S. A.; Kolachalama, V. B.; Beane, J. E.
Molecular and cellular alterations to the normal pseudostratified columnar bronchial epithelium result in the development of bronchial premalignant lesions representing a spectrum of histology from normal to hyperplasia, metaplasia, dysplasia (mild, moderate, and severe), carcinoma in situ and invasive carcinoma. Several studies have identified molecular alterations associated with lesion histology and progression. The broad and continuous spectrum of histologic and molecular changes makes reproducible stratification of lesions across multiple studies challenging. Here we propose a transformer-based framework that flexibly utilizes transcriptomic and histologic patterns to distinguish lesions with bronchial dysplasia or worse from normal, hyperplasia, and metaplasia. We leveraged H&E whole slide images (WSIs) of endobronchial biopsies and bulk gene expression data (GE) from previously published studies and on-going lung precancer atlas efforts, obtained from patients at high risk for lung cancer. Models trained using both WSIs and GE had higher performance than models trained on a single data modality. On an external testing dataset of WSIs, the area under the ROC curve (AUROC) of the model trained on WSIs plus GE was 0.761 ± 0.015 compared to 0.690 ± 0.027 for a model trained on WSIs. On external testing datasets of GE, the AUROC of the model trained on WSIs plus GE was 0.890 ± 0.023 versus 0.816 ± 0.032 for a model trained on GE. Based on these results, we leveraged data across 4 studies to train a flexible fusion model that allows one or both data modalities to be used in training. The model achieved an AUROC of 0.809 ± 0.036 on external testing WSI data and 0.903 ± 0.022 on external testing GE data. Despite model training on a binary label, model probabilities are associated with histologic grade and the model identifies gene expression alterations associated with bronchial dysplasia across multiple studies.
This framework maps bronchial premalignant lesions that contain at least one data modality into a spectrum of disease. In the future, a framework trained on multiple data modalities may be useful in predicting premalignant disease severity, progression, and interception agent efficacy.
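Because AUROC is the headline metric in this and several other abstracts on this page, a small illustration may help. The sketch below computes AUROC through its rank interpretation: the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `auroc` is hypothetical, and this is a generic definition rather than the evaluation code of any listed preprint.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as P(score_pos > score_neg) + 0.5 * P(score_pos == score_neg)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; library implementations use sorting instead.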
Wang, X.; Wang, Y.; Hu, W.; Briggs, M.; Yan, Z.; Hu, J.; Zhang, Y.; Duan, H.; Price, S.; Li, C.
Accurate integration of histological and molecular features is central to modern cancer diagnostics, but it is often hampered by resource-intensive parallel workflows, limited tissues, and increased diagnostic complexity. We present CAMPaS (Cross-modal AI for Integrated Molecular Pathology Diagnosis and Stratification), a clinical AI prototype that addresses challenges in real-world translation of jointly predicting glioma histology, molecular markers, and WHO 2021 integrative diagnoses from hematoxylin and eosin-stained slides. Trained and validated on 3,367 patients (6,043 slides) across eight cohorts (six retrospective, two prospective), CAMPaS achieved high diagnostic performance (AUC 0.895-0.916 in training; 0.946-0.955 in prospective cohorts) and generalized robustly across diverse settings. Its interpretable cross-modal predictions aligned with histopathological annotations and genomic profiles, revealing biologically coherent features. CAMPaS identified histological features for molecular markers, and its clinical utility was validated for enhancing real-world clinical diagnostics. Crucially, CAMPaS stratifies prognosis and treatment response, offering a scalable and biologically grounded solution to accelerate precision oncology.
Chaurasia, A. K.; Toohey, P. W.; Bennett, M. T.; Harris, H. C.; Hewitt, A. W.
Background: Accurate molecular profiling and prognostication from routine histopathology slides could transform precision oncology. We developed a Vision Transformer (ViT)-based multi-instance learning (MIL) framework for combined predictions of 32 solid tumour types, TP53 biomarker detection, and survival prediction directly from Whole Slide Images (WSIs). Methods: 11,060 primary tumours were curated from the TCGA Pan-Cancer Atlas with corresponding somatic mutations, RNA-seq, and clinical outcome data. TP53 alterations were classified as pathogenic drivers using COSMIC and hotspot annotations. WSIs underwent tissue masking, quality control, stain normalisation, and patch extraction (518 x 518) at 6x downsampling. Each patch was encoded by a ViT into a 768-dimensional embedding, which formed a token sequence for a 6-layer Transformer aggregator with learnable classification and positional embeddings. Seven task heads were developed to generate predictions for various outcomes, including cancer type, TP53 mutation status, TP53 RNA expression levels, overall survival (OS), progression-free interval (PFI), and the corresponding times for OS and PFI. The training process had two stages. First, the model was trained on tumour tissue patches from WSIs at five magnifications. In the second stage, it was fine-tuned using patches from all tissue regions with a content-aware strategy, updating all MIL layers for a maximum of 150 epochs at a learning rate of 1 x 10-. The model's performance was evaluated on an independent validation set of 1,729 slides using classification metrics, including the area under the receiver operating characteristic curve (AUROC), regression metrics, and concordance indices (C-index). Results: The multi-resolution ViT-based MIL model achieved an AUROC of 0.775 (95% CI: 0.749-0.801) for TP53 mutation detection on the validation set, demonstrating strong overall performance across classification and survival prediction tasks.
The fine-tuned model attained robust performance across the tasks, with 0.7569 accuracy for cancer classification, 0.745 AUROC for TP53 mutation detection, C-indices of 0.686 and 0.650 for OS and PFI, and a mean squared error of 1.072 for TP53 RNA expression level estimation. The fine-tuned model attained an accuracy of 65.9% (95% CI: 0.636-0.681) in tumour classification and an AUROC of 0.766 (95% CI: 0.743-0.789) for detecting TP53 mutations on the external validation set. However, most tumour classes, aside from ovarian cancer, reached an AUROC above 0.88 with class-specific thresholding using the Youden Index. This indicates strong generalisation across 32 tumour types, providing reasonable molecular profiling but offering limited prognostic utility in surgical oncology. Conclusion: A ViT-based MIL model can simultaneously infer tumour taxonomy, TP53 mutation status, and TP53 RNA expression levels directly from WSIs, with performance comparable to conventional genomic assays, while prognostic risk remains limited. This integrated, slide-level approach offers a scalable pipeline toward computational pathology.
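The class-specific thresholding with the Youden Index mentioned above can be illustrated as follows. This is a generic sketch, not the authors' code: the Youden statistic is J = sensitivity + specificity - 1, and the chosen threshold is the candidate that maximises J. The function name `youden_threshold` and the rule "predict positive when score >= threshold" are assumptions made for the example.

```python
from typing import Sequence, Tuple


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> Tuple[float, float]:
    """Return (threshold, J) maximising J = TPR - FPR over the observed scores."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    best_t, best_j = 0.0, float("-inf")
    # Each distinct score is a candidate cut-off; the lowest one wins on ties in J.
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        j = tp / n_pos - fp / n_neg  # sensitivity + specificity - 1
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j
```

In a one-vs-rest multi-class setting, the function would be called once per class with that class's scores and binary labels, giving one threshold per tumour type.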
Leyva, A.; Akbar, A.; Niazi, K.
Molecular subtyping of cancer is traditionally defined in transcriptomic space, yet routine clinical deployment is limited by the availability and cost of sequencing. Meanwhile, histopathology captures rich morphological information that is known to correlate with molecular state but lacks a principled, mechanistic bridge to gene-level representations. We propose a graph-constrained learning framework that aligns morphology-derived signals with a fixed, data-driven gene network discovered via hierarchical Monte Carlo screening. We can derive new gene sets for classification using random sampling, and use the coexpression network of that graph to enforce the learning of a pure morphology model without using gene expression. The resulting model performs subtype prediction using morphology alone, while being explicitly forced to operate through a gene-structured latent space. Structural alignment is enforced during training. For Moffitt classification in pancreatic cancer using PANCAN and TCGA datasets, the model has a reported 85% AUC using an alternative gene set network structure, while the alternate gene set itself has an 84% AUC in all patients that were classified with subtyping with pancreatic cancer in the dataset. This demonstrates that virtual transcriptomics can provide biologically grounded molecular insights using only routine histopathology slides, potentially expanding access to precision oncology in resource-limited settings.
Irmisch, A.; Bonilla, X.; Chevrier, S.; Lehmann, K.-V.; Singer, F.; Toussaint, N.; Esposito, C.; Mena, J.; Milani, E. S.; Casanova, R.; Stekhoven, D. J.; Wegmann, R.; Jacob, F.; Sobottka, B.; Goetze, S.; Kuipers, J.; Sarabia del Castillo, J.; Prummer, M.; Tuncel, M.; Menzel, U.; Jacobs, A.; Engler, S.; Sivapatham, S.; Frei, A.; Holtackers, R.; Gut, G.; Ficek, J.; Dummer, R.; Tumor Profiler Consortium; Aebersold, R.; Bacac, M.; Beerenwinkel, N.; Beisel, C.; Bodenmiller, B.; Koelzer, V. H.; Moch, H.; Pelkmans, L.; Snijder, B.; Tolnay, M.; Wollscheid, B.; Raetsch, G.; Levesque, M. P.
Recent technological advances allow profiling of tumor samples to an unparalleled level with respect to molecular and spatial composition as well as treatment response. We describe a prospective, observational clinical study performed within the Tumor Profiler (TuPro) Consortium that aims to show the extent to which such comprehensive information leads to advanced mechanistic insights into a patient's tumor, enables prognostic and predictive biomarker discovery, and has the potential to support clinical decision making. For this study of melanoma, ovarian carcinoma, and acute myeloid leukemia tumors, in addition to the emerging standard diagnostic approaches of targeted NGS panel sequencing and digital pathology, we perform extensive characterization using the following exploratory technologies: single-cell genomics and transcriptomics, proteotyping, CyTOF, imaging CyTOF, pharmacoscopy, and 4i drug response profiling (4i DRP). In this work, we outline the aims of the TuPro study and present preliminary results on the feasibility of using these technologies in clinical practice, showcasing the power of an integrative multi-modal and functional approach for understanding a tumor's underlying biology and for clinical decision support.
Yang, J.; Wang, M.; Doenitz, J.; Chapuy, B.; Beissbarth, T.
Identifying and validating genotype-guided drug combinations for a specific molecular subtype in cancer therapy represents an unmet medical need and is important in enhancing efficacy and reducing toxicity. However, the exponential increase in combinatorial possibilities constrains the ability to identify and validate effective drug combinations. In this context, we have developed Onko DrugCombScreen, an innovative tool aiming at advancing precision medicine based on identifying significant drug combination candidates in a target cancer cohort compared to a comparison cohort. Onko DrugCombScreen, inspired by the Molecular Tumor Board (MTB) process, synergizes drug knowledge-base analysis with various statistical methodologies and data visualization techniques to pinpoint drug combination candidates. Validated through a TCGA-BRCA case study, Onko DrugCombScreen has demonstrated its proficiency in discerning established drug combinations in a specific cancer type and in revealing potential novel drug combinations. By enhancing the capability of drug combination discovery through drug knowledge bases, Onko DrugCombScreen represents a significant advancement in personalized cancer treatment by identifying promising drug combinations, setting the stage for the development of more precise and potent combination treatments in cancer care. The Onko DrugCombScreen shiny app is available at https://rshiny.gwdg.de/apps/onko_drugcombscreen/. The Git repository can be accessed at https://gitlab.gwdg.de/MedBioinf/mtb/onko_drugcombscreen.
Shady, M.; Reardon, B.; Jiang, S.; Pimenta, E.; O'Meara, T.; Park, J.; Kehl, K. L.; Elmarakeby, H. A.; Sunyaev, S. R.; Van Allen, E. M.
Introduction: Precision oncology has informed cancer care by enabling the discovery and application of diagnostic, prognostic, and/or predictive molecular biomarkers. However, many patients lack actionable biomarkers or fail to respond to biomarker-directed therapies. Patient similarity approaches can leverage comprehensive tumor profiling and prior clinical experiences from large cohorts for decision support, facilitating broader realization of precision oncology insights. Methods: We developed a deep learning-based modeling framework using real-world clinicogenomic data from a tertiary cancer center to (i) measure patient similarity based on embedded tumor genomic profiles and (ii) evaluate the association of derived patient subgroups and neighborhoods with shared therapeutic outcomes in breast cancer-specific and histology-agnostic pan-cancer settings. Results: The model recovered clinically meaningful patient clusters reflecting both expected and previously unknown therapeutic associations, as well as patient-specific neighborhoods that could inform therapeutic trajectories more often than expected by chance in multiple clinical contexts. Moreover, model utility extended to patients without actionable genomic biomarkers and those with cancer of unknown primary (CUP) diagnoses, where neighborhoods aligned with independently predicted primary cancer type. These neighborhoods could also be examined over time in a continuously learning scenario. Conclusion: This similarity-based modeling framework distilled complex molecular and clinical data into concise, context-specific insights that augment clinician judgment, providing a foundation for a real-time learning, patient-centered decision support model in precision oncology.
Taub, F. E.; Gao, D.
Novel pivotal trial designs that more clearly demonstrate increased benefit over Standard of Care (SOC), especially in oncology and immuno-oncology (IO), are presented. The benefit of therapy is maximized and sample size is dramatically reduced. The novel methodology includes the introduction of a biochemical "Optimizing Diagnostic", for example a cfDNA test that can detect poor response when performed early after therapy is begun; this is used to change the therapy of the tested person early in the trial (typically to SOC), before clinical progression. Patients remain in "the new drug first" group, which is compared to SOC. An "Optimizing Diagnostic" is analogous to a "Companion Diagnostic"; both potentially allow approval of drugs that would otherwise fail. A companion diagnostic predicts benefit prior to therapy; the optimizing diagnostic (more accurately) predicts likelihood of benefit after initial therapy. Those patients deemed less likely to respond remain in the novel drug first arm, but are switched to SOC. A "Sequential Multiple Assignment Randomized Trial" (SMART) design is proposed to evaluate whether switching to SOC or SOC plus continuing the novel therapy is most beneficial. These designs will allow approval of therapy paths including novel agents when the novel agent could not be approved without this design. A good optimizing test may reduce the number of patients needed by 80%, dramatically reducing cost and time; more patients benefit and accrual is easier.
Key Points:
- A new clinical trial design focused on testing a path that begins with a novel regimen (IO is featured) is presented.
- The path includes an "optimizing diagnostic" that determines, early during treatment, if a patient should remain on the new regimen.
- Companion diagnostics define the path at a pre-treatment stage; optimizing diagnostics define the path early during treatment. Changes in therapy, typically to the SOC, are made based on the post-test probability of success.
- The novel path is significantly more likely to lead to approval than the novel regimen alone.
- Use of the novel method can reduce the size of a trial by 80%, and allow approval of the path, when approval of the novel regimen, based on a head-to-head trial vs SOC, would not be possible.
Tang, T.; Li, A.; Tan, X.; Ji, Q.; Si, L.; Bao, L.
Background: Patients with rare cancers face substantial challenges due to limited evidence-based treatment options, resulting from sparse clinical trials. Advances in large language models (LLMs) and recommendation algorithms offer new opportunities to utilize all clinical trial information to improve clinical decisions. Methods: We used an LLM to systematically extract and standardize more than 100,000 cancer trials from ClinicalTrials.gov. Each trial was annotated using a customized scoring system reflecting cancer-treatment interactions based on clinical outcomes and trial attributes. Using this structured data set, we implemented three state-of-the-art collaborative filtering algorithms to recommend potentially effective treatments across different cancer types. Results: The LLM-driven data extraction process successfully generated a comprehensive and rigorously curated database from fragmented clinical trial information, covering 78 cancer types and 5,315 distinct interventions. Recommendation models demonstrated high predictive accuracy (cross-validated RMSE: 0.49-0.62) and identified clinically meaningful new treatments for melanoma, independently validated by oncology experts. Conclusions: Our study establishes a proof of concept demonstrating that the combination of LLMs with sophisticated recommendation algorithms can systematically identify novel and clinically plausible cancer treatments. This integrated approach may accelerate the identification of effective therapies for rare cancers, ultimately improving patient outcomes by generating evidence-based treatment recommendations where traditional data sources remain limited.
Ahmad Zafar, S.; Qin, W.; Chengliang, L.; Khan, A. A.; Nazir, A.; Batool, H.; Khalid, F.; Faisal, M. S.
Homologous recombination deficiency (HRD) confers sensitivity to poly (ADP-ribose) polymerase (PARP) inhibitors and platinum-based chemotherapy, representing a critical biomarker for precision oncology across multiple malignancies. Current HRD assessment relies on next-generation sequencing of genomic scar signatures, but specialized infrastructure requirements, high costs, and prolonged turnaround times limit widespread adoption. These barriers restrict access to HRD testing, particularly in resource-constrained settings where the majority of cancer patients receive care. Pan-cancer HRD prediction has been shown, but robustness across histologies and institutions, leak-safe evaluation, and backbone-dependent generalization remain incompletely characterized. Here we show that IHGAMP (Integrative Histopathology-Genomic Analysis for Molecular Phenotyping), a computational framework using vision transformer foundation models, predicts HRD status from H&E images with an AUROC of 0.766 (95% CI 0.727-0.803) on the TCGA held-out test set using OpenCLIP embeddings, and improves to 0.827 with histopathology-pretrained OpenSlideFM embeddings under the same leak-safe protocol. External evaluation on 927 patients (2,718 whole slide images) from seven independent cohorts demonstrated generalization in adenocarcinoma/serous settings (e.g., CPTAC-LUAD AUROC 0.723) and enabled platinum resistance prediction in PTRC-HGSOC (AUROC 0.673), with attenuation in squamous histologies. Systematic comparison of foundation-model embeddings showed that OpenSlideFM outperformed OpenCLIP internally on TCGA (0.827 vs 0.766 AUROC) and improved external generalization in select cohorts (e.g., CPTAC-LUAD), while performance remained attenuated in squamous histologies; TSS-level embedding norm stability across 710 tissue source sites suggested limited site-driven magnitude shifts. 
Our findings establish that routine histopathology contains morphology associated with HRD that enables moderate, histology-dependent prediction, supporting a potential screening/triage role to prioritize confirmatory molecular testing where appropriate.
Tripathi, A. G.; Waqas, A.; Schabath, M. B.; Yilmaz, Y.; Rasool, G.
HONeYBEE (Harmonized ONcologY Biomedical Embedding Encoder) is an open-source framework that integrates multimodal biomedical data for oncology applications. It processes clinical data (structured and unstructured), whole-slide images, radiology scans, and molecular profiles to generate unified patient-level embeddings using domain-specific foundation models and fusion strategies. These embeddings enable survival prediction, cancer-type classification, patient similarity retrieval, and cohort clustering. Evaluated on 11,400+ patients across 33 cancer types from The Cancer Genome Atlas (TCGA), clinical embeddings showed the strongest single-modality performance with 98.5% classification accuracy and 96.4% precision@10 in patient retrieval. They also achieved the highest survival prediction concordance indices across most cancer types. Multimodal fusion provided complementary benefits for specific cancers, improving overall survival prediction beyond clinical features alone. Comparative evaluation of four large language models revealed that general-purpose models like Qwen3 outperformed specialized medical models for clinical text representation, though task-specific fine-tuning improved performance on heterogeneous data such as pathology reports.
Andani, S.; Chen, B.; Ficek-Pascual, J.; Heinke, S.; Casanova, R.; Sobottka, B.; Bodenmiller, B.; The Tumor Profiler Consortium; Kölzer, V. H.; Rätsch, G.
Multiplexed protein imaging offers valuable insights into interactions between tumors and their surrounding tumor microenvironment (TME), but its widespread use is limited by cost, time, and tissue availability. We present HistoPlexer, a deep learning framework that generates spatially resolved protein multiplexes directly from standard hematoxylin and eosin (H&E) histopathology images. HistoPlexer jointly predicts multiple tumor and immune markers using a conditional generative adversarial architecture with custom loss functions designed to ensure pixel- and embedding-level similarity while mitigating slice-to-slice variations. A comprehensive evaluation on metastatic melanoma samples demonstrates that HistoPlexer-generated protein maps closely resemble real maps, as validated by expert assessment. They preserve crucial biological relationships by capturing spatial co-localization patterns among proteins. The spatial distribution of immune infiltration from HistoPlexer-generated protein multiplex enables stratification of tumors into immune subtypes. In an independent cohort, integration of HistoPlexer-derived features into predictive models enhances performance in survival prediction and immune subtype classification compared to models using H&E features alone. To assess broader applicability, we benchmarked HistoPlexer on publicly available pixel-aligned datasets from different cancer types. In all settings, HistoPlexer consistently outperformed baseline methods, demonstrating robustness across diverse tissue types and imaging conditions. By enabling whole-slide protein multiplex generation from routine H&E images, HistoPlexer offers a cost- and time-efficient approach to tumor microenvironment characterization with strong potential to advance precision oncology.
Maitra, C.; Das, V.; Seal, D. B.; De, R. K.
Lung cancer is characterized by profound intratumoral and inter-patient heterogeneity, spanning histological subtypes, molecular landscapes, and the tumor microenvironment. While multi-omics integration is essential for capturing this complexity, leveraging these data to explicitly define survival-associated subpopulations remains a significant challenge. In this study, we developed NeuroMDAVIS-FS, an unsupervised deep learning framework designed to stratify lung cancer patients by survival risk, and identify molecular determinants underlying improved clinical outcomes. Using the CPTAC cohort, we integrated genomic (CNV), transcriptomic (RNA-seq), and proteomic profiles to extract modality-specific features. Candidate biomarkers were validated through Kaplan-Meier (KM) survival analysis and univariate Cox proportional hazards (CoxPH) regression. A final multivariate CoxPH model effectively stratified patients into high-risk and low-risk cohorts (Kaplan-Meier p-value < 0.001). Notably, the integration of these molecular features with baseline clinical models significantly enhanced prognostic accuracy, improving the concordance index by 43.79% in LUAD, 31.05% in LSCC, and 23.76% across the pan-lung cancer cohort. These results demonstrate that NeuroMDAVIS-FS identifies robust, biologically relevant features that surpass traditional clinical variables in predicting patient outcomes, offering a scalable path for precision oncology.
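To make the concordance index reported here (and in several other abstracts on this page) concrete, the following is a minimal sketch of Harrell's C for right-censored data. It is a textbook formulation, not the authors' code, and the names `concordance_index` and `relative_gain_percent` are hypothetical. The abstract does not say whether its percentages are relative or absolute gains; the helper below assumes relative gain purely for illustration.

```python
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[int],
                      risks: Sequence[float]) -> float:
    """Harrell's C: fraction of comparable pairs in which the earlier event has the higher risk."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def relative_gain_percent(baseline: float, improved: float) -> float:
    """Relative improvement in percent (assumed interpretation, see lead-in)."""
    return (improved - baseline) / baseline * 100.0
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ordering of survival times by predicted risk.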
Olsen, L. R.; Odinokov, D.; Holsting, J. Q.; Kondrup, K.; Iisager, L.; Rusan, M.; Buus, S.; Laursen, B. E.; Borre, M.; Jochumsen, M. R.; Bouchelouche, K.; Frydendahl, A.; Rasmussen, M. H.; Henriksen, T. V.; Nesic, M.; Demuth, C.; Lindskrog, S. V.; Nordentoft, I.; Lamy, P.; Therkildsen, C.; Dyrskjot, L.; Sorensen, K. D.; Andersen, C. L.; Skanderup, A. J.; Besenbacher, S.
The fragmentation patterns of whole genome sequenced cell-free DNA are promising features for tumor-agnostic cancer detection. However, systematic biases challenge their cross-cohort generalization. We introduce LIONHEART, a novel, open source cancer detection method specifically optimized to generalize across datasets. The method correlates bias-corrected cfDNA fragment coverage across the genome with the locations of accessible chromatin regions from 898 cell and tissue type features. We use these correlations to detect changes in the cell-free DNA cell type composition caused by cancer. We test LIONHEART on nine datasets and fourteen cancer types (1106 non-cancer controls, 1449 cancers) obtained from different studies and show that it can distinguish cancer samples from non-cancer controls across cohorts with ROC AUC scores ranging from 0.62-0.95 (mean = 0.83, std = 0.12). We further validate the method on an external dataset, achieving a ROC AUC of 0.917.
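The core idea described above, correlating genome-wide fragment coverage with chromatin accessibility per cell type, can be sketched in a few lines. This is a heavily simplified illustration and not the LIONHEART implementation: it omits the bias correction, uses plain Pearson correlation, and the names `pearson`, `cell_type_features`, and the toy track names in the usage note are hypothetical.

```python
import math
from typing import Dict, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / (sx * sy)


def cell_type_features(coverage: Sequence[float],
                       accessibility: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """One feature per cell type: correlation of binned coverage with that type's accessibility track."""
    return {name: pearson(coverage, track) for name, track in accessibility.items()}
```

For example, `cell_type_features([5, 3, 4, 1], {"type_a": [0, 1, 0, 1], "type_b": [1, 0, 1, 0]})` yields one correlation per track; a downstream classifier would then be trained on such feature vectors.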
Zamanitajeddin, N.; Jahanifar, M.; Eastwood, M.; Gunesli, G.; Arends, M. J.; Rajpoot, N.
Microsatellite instability (MSI) is a key biomarker for immunotherapy response and prognosis across multiple cancers, yet its identification from routine Hematoxylin and Eosin (H&E) slides remains challenging. Current deep learning predictors often operate as black-box, weakly supervised models trained on individual slides, limiting interpretability, biological insight, and generalization, particularly in low-data regimes. Importantly, systematic quantitative analysis of shared MSI-associated characteristics across different cancer types has not been performed, representing a major gap in understanding conserved tumor microenvironment (TME) patterns linked to MSI. Here, we present a multi-cancer MSI prediction model that leverages pathology foundation models for robust feature extraction and cell-level social network analysis (SNA) to uncover TME patterns associated with MSI. For the MSI prediction task, we introduce a novel transformer-based embedding aggregation method, leveraging attention-guided, multi-case batch training to improve learning efficiency, stability, and interpretability. Our method achieves high predictive performance, with mean AUROCs of 0.86 ± 0.06 (colorectal cancer), 0.89 ± 0.06 (stomach adenocarcinoma), and 0.73 ± 0.06 (uterine corpus endometrial carcinoma) in internal cross-validation on the TCGA dataset and an AUROC of 0.99 on the external PAIP dataset, outperforming state-of-the-art weakly supervised methods (particularly in AUPRC with an average of 0.65 across three cancers). Multi-cancer training further improved generalization (by 3%) via exposing the model to diverse MSI manifestations, enabling robust learning of transferable, domain-invariant histological patterns. To investigate the TME, we constructed cell graphs from high-attention regions, classifying cells as epithelial, inflammatory, mitotic, or connective, and applied SNA metrics to quantify spatial interactions.
Across cancers, MSI tumors exhibited increased epithelial cell density and stronger epithelial-inflammatory connectivity, with subtle, context-dependent changes in stromal organization. These features were consistent across univariate and multivariate analyses and supported by expert pathologist review, suggesting the presence of a conserved MSI-associated microenvironmental phenotype. Our proposed prediction algorithm and SNA-driven interpretation advance MSI prediction and uncover interpretable, biologically meaningful MSI signatures shared across colorectal, gastric, and endometrial cancers.
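The cell-graph analysis described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the abstract does not say how cells are connected or which SNA metrics are used, so the distance-based edge rule, the `radius` parameter, and the function names `build_cell_graph` and `cross_type_edge_fraction` are all assumptions made for illustration. The second function is one simple proxy for "epithelial-inflammatory connectivity".

```python
import math
from itertools import combinations


def build_cell_graph(cells, radius):
    """Connect every pair of cells whose centroids lie within `radius`.

    `cells` is a list of (x, y, cell_type) tuples. The distance rule is an
    assumption; the abstract does not specify how edges are defined.
    Returns a list of (i, j) index pairs with i < j.
    """
    edges = []
    for (i, a), (j, b) in combinations(enumerate(cells), 2):
        if math.hypot(a[0] - b[0], a[1] - b[1]) <= radius:
            edges.append((i, j))
    return edges


def cross_type_edge_fraction(cells, edges, type_a, type_b):
    """Fraction of edges that join a `type_a` cell to a `type_b` cell.

    A hypothetical connectivity metric, chosen for simplicity.
    """
    if not edges:
        return 0.0
    wanted = {type_a, type_b}
    hits = sum(1 for i, j in edges if {cells[i][2], cells[j][2]} == wanted)
    return hits / len(edges)


if __name__ == "__main__":
    # Toy coordinates, not real data.
    toy_cells = [
        (0.0, 0.0, "epithelial"),
        (1.0, 0.0, "inflammatory"),
        (0.0, 1.0, "epithelial"),
        (10.0, 10.0, "connective"),
    ]
    toy_edges = build_cell_graph(toy_cells, radius=1.5)
    print(cross_type_edge_fraction(toy_cells, toy_edges, "epithelial", "inflammatory"))
```

In practice a graph library and richer metrics (degree, clustering, centrality per cell type) would likely replace this hand-rolled version; the point here is only the shape of the computation.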
Hu, Y.; Sirinukunwattana, K.; Li, B.; Gaitskell, K.; Bonnaffe, W.; Wojciechowska, M.; Wood, R.; Alham, N. K.; Malacrino, S.; Woodcock, D.; Verrill, C.; Ahmed, A.; Rittscher, J.
Predicting disease-related molecular traits from histomorphology brings great opportunities for precision medicine. Despite the rich information present in histopathological images, extracting fine-grained molecular features from standard whole slide images (WSI) is non-trivial. The task is further complicated by the lack of annotations for subtyping and contextual histomorphological features that might span multiple scales. This work proposes a novel multiple-instance learning (MIL) framework capable of WSI-based cancer morpho-molecular subtyping across scales. Our method, Inter-MIL, follows a weakly supervised scheme. It enables the training of the patch-level encoder for WSI in a task-aware optimisation procedure, a step that is normally infeasible in most existing MIL-based WSI analysis frameworks. We demonstrate that optimising the patch-level encoder is crucial to achieving high-quality fine-grained and tissue-level subtyping results and offers a significant improvement over task-agnostic encoders. Our approach deploys a pseudo-label propagation strategy to update the patch encoder iteratively, allowing discriminative subtype features to be learned. This mechanism also enables the extraction of fine-grained attention within image tiles (the small patches), a task largely ignored in most existing weakly supervised frameworks. With Inter-MIL, we carried out four challenging cancer molecular subtyping tasks in the context of ovarian, colorectal, lung, and breast cancer. Extensive evaluation results show that Inter-MIL is a robust framework for cancer morpho-molecular subtyping with superior performance compared to several recently proposed methods, even in data-limited scenarios where the number of available training slides is less than 100. 
The iterative optimisation mechanism of Inter-MIL significantly improves the quality of the image features learned by the patch encoder and generally directs the attention map to areas that better align with experts' interpretation, leading to the identification of more reliable histopathology biomarkers.
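One way to picture pseudo-label propagation in a MIL setting is sketched below. This is not Inter-MIL's actual procedure: the abstract does not state how pseudo-labels are selected, so the top-k-by-attention rule, the `top_k` parameter, and the function names are assumptions for illustration only. The idea shown is that the slide-level label is handed down to the patches the attention module weights most heavily, and those patches could then be used to fine-tune the patch encoder in the next iteration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def propagate_pseudo_labels(attention_scores, slide_label, top_k):
    """Give the slide label to the `top_k` most attended patches.

    Patches outside the top k stay unlabeled (None). The selection rule is a
    hypothetical stand-in for whatever strategy the method really uses.
    """
    weights = softmax(attention_scores)
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    chosen = set(ranked[:top_k])
    return [slide_label if i in chosen else None for i in range(len(weights))]


if __name__ == "__main__":
    # Toy attention scores for four patches of one slide.
    labels = propagate_pseudo_labels([0.1, 2.0, -1.0, 1.5], "subtype_A", top_k=2)
    print(labels)
```

The labeled patches returned here would form the training set for one encoder update; repeating attention scoring and relabeling after each update gives the iterative loop the abstract alludes to.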
Wang, X.; Chen, Y.; Liu, X.; Qiu, C.; Tang, H.; Huang, T.; Guo, S.; Ma, S.; Cai, M.; Sun, Q.; Chang, Z.; Liu, J.; Wang, X.; Li, J.; Qian, W.; Wang, B.; Zhang, B.; Bai, C.; Shi, M.; Zhang, X.; Li, M.; Wang, J.; Wang, B.; Ma, J.; Ai, L.; Yu, S.; Wang, L.; Feng, N.; Liu, X.; Yu, G.
The histological heterogeneity of primary tumours across the pan-cancer spectrum poses a formidable barrier to accurate lymph node metastasis assessment, often causing AI systems to make "overconfident errors" on rare variants that lead to missed diagnoses. To address this, we present UPATHLN, a unified diagnostic platform that synergizes a pathology foundation model-based encoder with a decoupled uncertainty estimation mechanism. We developed and validated the system using a large-scale multicentre dataset of 26,229 lymph nodes from 14 distinct primary origins. In internal validation, UPATHLN achieved an area under the curve (AUC) of 0.986. Crucially, the uncertainty module functioned as a decisive fail-safe: by flagging potential false-negative predictions for mandatory pathologist review, it intercepted all missed diagnoses, securing 100% conditional sensitivity across both the development and independent test cohorts, even for tumours from seven unseen primary origins. Concurrently, this mechanism reduced the review burden on negative lymph nodes by 73.2%. Ultimately, UPATHLN sets a new benchmark for safety-critical AI, demonstrating that explicitly modelling uncertainty is key to unlocking reliable, workload-efficient diagnostics at the pan-cancer scale.
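The fail-safe idea in this abstract, flagging uncertain negative predictions for pathologist review, can be sketched as follows. The abstract does not define "conditional sensitivity" or how the review-burden reduction is computed, so both definitions below are assumptions: sensitivity is counted as if every flagged node is correctly resolved on review, and the burden reduction is taken as the share of truly negative nodes cleared without review. The record fields, the `threshold` parameter, and the function name `triage` are likewise hypothetical.

```python
def triage(records, threshold):
    """Flag uncertain negative predictions and summarise the effect.

    `records` is a list of dicts with keys 'pred' (0 or 1), 'uncertainty'
    (float) and 'truth' (0 or 1). Returns (flagged, sensitivity, reduction):
      flagged      list of bools, True where a predicted-negative node has
                   uncertainty >= threshold and is sent for review
      sensitivity  share of truly positive nodes either predicted positive or
                   flagged (assumes review resolves flagged nodes correctly)
      reduction    share of truly negative nodes predicted negative and not
                   flagged, i.e. cleared without review (assumed definition)
    """
    flagged = [r["pred"] == 0 and r["uncertainty"] >= threshold for r in records]

    positives = [(r, f) for r, f in zip(records, flagged) if r["truth"] == 1]
    negatives = [(r, f) for r, f in zip(records, flagged) if r["truth"] == 0]

    caught = sum(1 for r, f in positives if r["pred"] == 1 or f)
    cleared = sum(1 for r, f in negatives if r["pred"] == 0 and not f)

    sensitivity = caught / len(positives) if positives else float("nan")
    reduction = cleared / len(negatives) if negatives else float("nan")
    return flagged, sensitivity, reduction


if __name__ == "__main__":
    # Toy records, not real data.
    toy = [
        {"truth": 1, "pred": 1, "uncertainty": 0.1},
        {"truth": 1, "pred": 0, "uncertainty": 0.9},  # missed, but flagged
        {"truth": 0, "pred": 0, "uncertainty": 0.1},
        {"truth": 0, "pred": 0, "uncertainty": 0.8},  # flagged for review
        {"truth": 0, "pred": 0, "uncertainty": 0.2},
        {"truth": 0, "pred": 1, "uncertainty": 0.1},
    ]
    print(triage(toy, threshold=0.5))
```

Lowering `threshold` catches more missed positives at the cost of sending more negatives to review, which is the trade-off the abstract's two headline numbers describe.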
Levy, J.; Davis, M.; Chacko, R.; Davis, M.; Fu, L.; Goel, T.; Pamal, A.; Nafi, I.; Angirekula, A.; Christensen, B.; Hayden, M.; Vaickus, L.; LeBoeuf, M.
Successful treatment of solid cancers relies on complete surgical excision of the tumor either for definitive treatment or before adjuvant therapy. Radial sectioning of the resected tumor and surrounding tissue is the most common form of intra-operative and post-operative margin assessment. However, this technique samples only a tiny fraction of the available tissue and therefore may result in incomplete excision of the tumor, increasing the risk of recurrence and distant metastasis and decreasing survival. Repeat procedures, chemotherapy, and other resulting treatments pose significant morbidity, mortality, and fiscal costs for our healthcare system. Mohs Micrographic Surgery (MMS) is used for the removal of basal cell and squamous cell carcinoma utilizing frozen sections for real-time margin assessment while assessing 100% of the peripheral and deep margins, resulting in a recurrence rate of less than one percent. Real-time assessment in many tumor types is constrained by tissue size and complexity and the time to process tissue and evaluate slides while a patient is under general anesthesia. In this study, we developed an artificial intelligence (AI) platform, ArcticAI, which augments the surgical workflow to improve efficiency by reducing rate-limiting steps in tissue preprocessing and histological assessment through automated mapping and orientation of tumor to the surgical specimen. Using basal cell carcinoma (BCC) as a model system, the results demonstrate that ArcticAI can provide effective grossing recommendations, accurately identify tumor on histological sections, map tumor back onto the surgical resection map, and automate pathology report generation resulting in seamless communication between the surgical pathology laboratory and surgeon. 
AI-augmented surgical excision workflows may make real-time margin assessment for the excision of more complex and challenging tumor types more accessible, leading to more streamlined and accurate tumor removal while increasing healthcare delivery efficiency.